Bad Hash Function

h(k) = k % 16

Good Hash Function

h(k) = ((k >> 4) ^ k) % 16

Hash Function Properties

A good hash function spreads keys to avoid clustering, but collisions are still inevitable.

- Uniform Distribution: Each slot should be roughly equally likely.

- Fast & Deterministic: Cheap to compute, same input → same output.

- Inevitable Collisions: Finite buckets vs (often) huge key space.

Demonstration

Select an input sequence and insert keys.

Execution Log:

Why No Perfect Hash Exists (Generally)

The "Good Hash" uses bitwise mixing to decorrelate patterned inputs (e.g., multiples of 16). Bit mixing (shift + XOR) helps produce more varied lower bits before the modulo.

  • k >> 4: Pulls higher bits downward.
  • ^ (XOR): Combines patterns to break simple progressions.
  • % 16: Final bucket selection.

Inevitable Collisions & What To Do:

  • Pigeonhole Principle: More possible keys than buckets ⇒ collisions must occur.
  • No universally "best" hash: Depends on workload.
  • Resize / more buckets: Lowers load factor α = n / m but costs a full rehash.
  • Collision handling required:
    • Separate chaining
    • Open addressing (linear / quadratic / double hashing)
  • Policy: Resize when α exceeds threshold (e.g., 0.7).

Takeaway: Distribute well, keep load reasonable, still handle collisions.